1  Introduction


Correspondence is the process of relating information in one image to its equivalent in others. Using correspondence, a vision system can infer the 3-D structure of the observed scene, an inference that is significantly more difficult to make from a single 2-D image. Establishing correspondence is itself a difficult task. Various methods have been developed in recent years to achieve correspondence for stereo vision (e.g., [Marr and Poggio 1979, Grimson 1980, Baker and Binford 1981]), motion analysis (e.g., [Koenderink and Van Doorn 1975, Ullman 1979, Longuet-Higgins 1981, Hildreth 1984]), and object recognition (e.g., [Fischler and Bolles 1981, Grimson and Lozano-Pérez 1984, Lowe 1987, Huttenlocher and Ullman 1987]).

In motion analysis, the stage of establishing correspondence is usually viewed as independent of the stage of shape recovery [Ullman 1978]. According to this view, correspondence is determined so as to minimize the observed 2-D motion along the image sequence. No assumptions are made at this stage about the shape of the moving objects or the transformations they undergo. In this way correspondence can be found even for objects that undergo nonrigid transformations, and when the images contain several objects moving differently.

The distinction between the two processes of correspondence and shape recovery is useful when the motion between frames is relatively small, in which case a minimization process can resolve the correspondence correctly. When “long range motion” is considered, however, minimization techniques often fail to find the correct correspondence. In these cases, information about the transformation may be used to guide the process of establishing correspondence.

An important application that requires correspondence under “long range motion” conditions is the construction of 3-D representations for object recognition. In this process, shape information is accumulated over time until a complete model of the object is constructed. During this period the object may be observed in positions that differ significantly from one another, and the process should tolerate such differences.

Correspondence is useful not only for constructing object-centered models but also for viewer-centered ones. Recognition schemes that use viewer-centered representations have recently been developed. In [Ullman and Basri 1991] an object is represented by a small number of its 2-D images together with the correspondence between the images. The appearance of the object from different viewpoints is predicted by linear combinations of its model images. These predictions are exact for rigid objects. Similar representations were used by Poggio and Edelman [1990], whose approach approximates the appearance of objects from arbitrary viewpoints using radial basis functions.
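As a rough illustration of the linear-combination idea, the sketch below builds synthetic orthographic views of random 3-D points (no scaling; the data, seed and helper names are hypothetical and not taken from [Ullman and Basri 1991]). It checks that the coordinates of a novel view are linear combinations of $x_1$, $y_1$, $x_2$ and a constant vector, fitting the coefficients from five points and predicting the remaining ones.

```python
import numpy as np

def random_rotation(rng):
    """Random 3x3 rotation matrix (QR of a Gaussian matrix, made proper)."""
    q, r = np.linalg.qr(rng.normal(size=(3, 3)))
    q = q @ np.diag(np.sign(np.diag(r)))
    if np.linalg.det(q) < 0:
        q[:, 0] = -q[:, 0]
    return q

def ortho_view(points, R, t):
    """Orthographic image (x, y) of 3-D points (n, 3) after rotation R and translation t."""
    moved = points @ R.T + t
    return moved[:, 0], moved[:, 1]

rng = np.random.default_rng(0)
P = rng.uniform(-1.0, 1.0, size=(30, 3))                    # synthetic object points
x1, y1 = ortho_view(P, np.eye(3), np.zeros(3))               # model view 1
x2, _ = ortho_view(P, random_rotation(rng), rng.normal(size=3))   # model view 2
xn, yn = ortho_view(P, random_rotation(rng), rng.normal(size=3))  # novel view

# Basis {x1, y1, x2, 1}: fit the coefficients on 5 points, predict all 30.
B = np.column_stack([x1, y1, x2, np.ones_like(x1)])
coef_x, *_ = np.linalg.lstsq(B[:5], xn[:5], rcond=None)
coef_y, *_ = np.linalg.lstsq(B[:5], yn[:5], rcond=None)
err = max(np.abs(B @ coef_x - xn).max(), np.abs(B @ coef_y - yn).max())
print(f"max prediction error: {err:.2e}")
```

With noise-free synthetic data the error stays at floating-point rounding level. With real images the coefficients would instead be estimated by least squares from more points.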

Point-to-point correspondence between images is therefore crucial for constructing both object-centered and viewer-centered representations. In the object-centered case this follows from the fact that structure-from-motion algorithms require full correspondence between the images; once the correspondence is known, structure recovery is fairly straightforward. In the viewer-centered case, full correspondence between images provides implicit information about the depth values of the points. The stability of a representation, measured by the errors induced when the appearance of the modeled object is predicted from arbitrary viewpoints, tends to increase when the images used to construct the model are taken from viewing angles that are relatively distant from one another.

One assumption generally used in vision applications such as motion analysis and object recognition is that the observed objects are rigid. Lee and Huang [1990] have recently addressed the question of how rigidity affects the solution to the correspondence problem. They showed that under orthographic projection the correspondence of points can only be determined up to straight lines (known as “the epipolar lines of the points”), and that four corresponding points determine the position of these lines. They did not specify a method to resolve the correspondence within the lines.

The epipolar line idea is not new. It is used extensively in stereopsis, but rarely in establishing correspondence in motion analysis. Bolles and Baker [1985] used epipolar lines to analyze motion sequences obtained by translation along a straight line. Yachida [1986] and Ayache and Lustman [1987] used them in developing their trinocular stereovision algorithms. In this paper we examine the use of epipolar lines in establishing correspondence for depth reconstruction. In the first part of the paper (Section 2) we review the theory behind epipolar lines and show how to compute them from a small number of corresponding points. Our formulation differs somewhat from that of Lee and Huang [1990], and we analyze the similarities and differences between the orthographic and perspective projection models. We show that epipolar lines exist even in more complicated situations, such as when an object undergoes a general linear transformation (including stretch and shear), and when objects with smooth bounding surfaces are considered. In the second part of the paper (Section 3) we show that the correspondence is not determined uniquely even when three or more images are given. Additional images can, however, be used to resolve the correspondence heuristically [Yachida 1986]. We have applied this method to arbitrarily curved images, and used the results to construct object models for recognition. In addition, we discuss how epipolar lines can be used to solve the aperture problem.


2  Correspondence  from  Two  Images 

The correspondence problem discussed below is defined as follows. Given a pair of 2-D images, for every point in space that is projected to both images, find its location in the two images. Often only feature points (such as contour points) are considered. We examine this problem assuming the images differ by a rigid transformation. We consider two projection models: orthographic projection (with a uniform scale factor to compensate for depth changes) and perspective projection. We begin by introducing general properties common to both projection models, and later prove these properties for each model separately. Finally, we extend these properties to more complicated cases, such as objects that undergo general affine (rather than rigid) transformations and objects with smooth bounding surfaces.

Our analysis consists of three steps:

1. We show that rigidity divides the images into sets of epipolar lines. The correspondence between the lines is determined by the transformation that separates the images, but the correspondence of points along the lines cannot be determined.

2.  The  epipolar  lines  can  be  recovered  from  a  small  set  of  corresponding  points,  four 
in  the  orthographic  case  and  seven  in  the  perspective  case. 

3.  These  results  apply  also  to  objects  that  undergo  general  affine  transformation  and 
to  objects  with  smooth  bounding  surfaces. 

Proposition 1 establishes that in a pair of images related by a rigid transformation, a point in one image can potentially match any point in the second image that lies along a straight line (referred to as “the epipolar line of that point”).

Let $P_1$ and $P_2$ be two projections (either orthographic or perspective) of a rigid object from two given viewpoints. Let $(x, y)$ be the projection of some object point in $P_1$.

Proposition 1: The point corresponding to $(x, y)$ in $P_2$ lies along a straight line given by

$$ (x', y') = \mathbf{u} + a(z)\,\mathbf{v} $$

where $\mathbf{u}, \mathbf{v} \in \mathbb{R}^2$ are constants (namely, independent of $z$), and $a$ is a scalar function of $z$.

Following Proposition 1, given the transformation that relates the two images, the correspondence is determined up to a straight line. The vectors $\mathbf{u}$ and $\mathbf{v}$ are determined both by the transformation and by the 2-D position of the point $(x, y)$, while $a$ is the only component that depends on $z$, the depth value of the point in 3-D. There is a one-to-one mapping between the position of $p$ along the epipolar line in $P_2$ and its depth value: every depth value corresponds to a different location of $p$ along the epipolar line, and every location along the epipolar line determines a different depth value.
In some cases the epipolar lines vanish and point correspondence is uniquely determined. This occurs in the degenerate case $\mathbf{v} = 0$, in which the position $(x', y')$ does not depend on the depth value of the point. Under orthographic projection this occurs when the object is rotated about the line of sight and then translated arbitrarily. Under perspective projection $\mathbf{v}$ vanishes when the object is rotated about the camera, that is, about the center of projection with no translation.

The epipolar lines are parallel in the orthographic case, since there $\mathbf{v}$ depends solely on the transformation and is therefore common to all object points. This is not always true in the perspective case, where the epipolar lines are parallel only if the transformation includes no translation in depth. If, however, $t_z \neq 0$, the epipolar lines intersect at a single point known as the focus of expansion.

A rigid transformation divides the image into epipolar lines within which correspondence cannot be determined. Every epipolar line in one image has a corresponding epipolar line in the second image, in the sense that all the points lying along some epipolar line in the first image share the same epipolar line in the second image, and vice versa. This is established in Proposition 2.

Proposition 2: Let $p_1, p_2 \in P_1$ be two points that lie along some common epipolar line. The epipolar line of $p_1$ and the epipolar line of $p_2$ in $P_2$ coincide.

Since rigidity alone determines the correspondence only up to epipolar lines, it may in some cases be sufficient to recover the epipolar lines rather than all the parameters of the transformation that separates the two images. Interestingly, under orthographic projection the epipolar lines are determined from two images while the transformation is not. Four noncoplanar points are required for this task. The transformation breaks up into its planar and nonplanar parts. The planar parts of the transformation are determined by the epipolar lines, while the nonplanar part, the rotation in depth, cannot be recovered. In the perspective case both the epipolar lines and the transformation are determined from two images. In this case seven points are required.

The results above apply in two additional cases that extend beyond the set of rigid transformations. Epipolar lines exist when the objects considered undergo a general 3-D affine transformation, which includes stretch and shear. The same applies, under orthographic projection, to objects with smooth bounding surfaces. In this case the contours change their position on the object with the viewpoint (see the discussion in [Basri and Ullman 1988]). This motion is projected along epipolar lines (see Section 2.3 below). In both cases, corresponding points lie along epipolar lines, and these epipolar lines can be recovered from a small set of corresponding points.

2.1  Orthographic  Projection


In this section we restate the results presented at the beginning of Section 2 and prove them for the orthographic case. Let $P_1$ and $P_2$ be two images of a rigid object from two arbitrary viewpoints. Let $p = (x, y, z)$ be an object point. Its position in $P_1$ is given by $(x, y)$, and its position in $P_2$ is given by $(x', y')$, the orthographic projection of $sRp + \mathbf{t}$, where $s$ is a scale factor, $R$ is a $3 \times 3$ rotation matrix, and $\mathbf{t}$ is a translation vector.

In the following analysis we assume that the transformation between the images (namely, $s$, $R$, and $\mathbf{t}$) is known. We select a point $(x, y)$ in the first image and compute its possible positions in the second image. We show that the set of these positions forms a straight line, and that the exact position along this line is determined by the depth value of the point.

Proposition 1a: Given a rigid transformation defined by $\{s, R, \mathbf{t}\}$ and a point $(x, y) \in P_1$, its corresponding point in $P_2$ lies along the epipolar line given by

$$ (x', y') = \mathbf{u} + z\,\mathbf{v} $$

where $\mathbf{u}, \mathbf{v} \in \mathbb{R}^2$ are constants.

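Proof: Denote by $r_{ij}$ $(1 \le i, j \le 3)$ the elements of $R$, and by $t_x$ and $t_y$ the horizontal and vertical components of the translation. Since

$$ \begin{pmatrix} x' \\ y' \end{pmatrix} = \begin{pmatrix} s(r_{11}x + r_{12}y + r_{13}z) + t_x \\ s(r_{21}x + r_{22}y + r_{23}z) + t_y \end{pmatrix} $$

we define

$$ \mathbf{u} = \begin{pmatrix} s r_{11}x + s r_{12}y + t_x \\ s r_{21}x + s r_{22}y + t_y \end{pmatrix}, \qquad \mathbf{v} = \begin{pmatrix} s r_{13} \\ s r_{23} \end{pmatrix}. $$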

Notice that since the transformation is given, $\mathbf{u}$ and $\mathbf{v}$ are determined for a particular point $(x, y)$, and consequently its corresponding point lies along the straight line $\mathbf{u} + z\mathbf{v}$. When the depth value $z$ of the point is given, the location of the corresponding point along the line is determined; conversely, selecting a corresponding point along the line determines its depth value. When $\mathbf{v} = 0$ the epipolar line degenerates to a point. In this case the images are separated by a rotation about the line of sight (plus some arbitrary translation). By symmetry we obtain the same results for points in the second image, namely, that their corresponding points in $P_1$ lie along straight lines.
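A small numeric check of Proposition 1a, assuming NumPy and a rotation built from the Z-Y-Z angles used later in this section (the helper names and sample numbers are illustrative only). It computes $\mathbf{u}$ and $\mathbf{v}$ for one point and confirms that the true projection equals $\mathbf{u} + z\mathbf{v}$. It then inverts the relation to read the depth back from the matched point, which is the one-to-one mapping between depth and position along the line.

```python
import numpy as np

def rotation_zyz(alpha, beta, gamma):
    """R = Rz(gamma) @ Ry(beta) @ Rz(alpha): about Z by alpha, then Y by beta, then Z by gamma."""
    def rz(a):
        c, s = np.cos(a), np.sin(a)
        return np.array([[c, -s, 0.0], [s, c, 0.0], [0.0, 0.0, 1.0]])
    c, s = np.cos(beta), np.sin(beta)
    ry = np.array([[c, 0.0, s], [0.0, 1.0, 0.0], [-s, 0.0, c]])
    return rz(gamma) @ ry @ rz(alpha)

def ortho_epipolar_line(x, y, s, R, t):
    """Return (u, v) such that the match of (x, y) in P2 is u + z * v."""
    u = s * (R[:2, :2] @ np.array([x, y])) + t
    v = s * R[:2, 2]  # independent of (x, y): all epipolar lines are parallel
    return u, v

s, R, t = 1.2, rotation_zyz(0.3, 0.8, -0.5), np.array([0.4, -0.1])
p = np.array([0.5, -0.7, 2.0])        # object point (x, y, z)
u, v = ortho_epipolar_line(p[0], p[1], s, R, t)
match = s * (R @ p)[:2] + t           # true position (x', y') in P2
print(np.allclose(match, u + p[2] * v))  # True

# Selecting a point on the line determines its depth (requires v != 0).
z_hat = (match - u) @ v / (v @ v)
print(round(float(z_hat), 9))  # 2.0
```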
The epipolar lines in each of the images are parallel. This follows from the fact that $\mathbf{v}$ depends solely on the transformation and therefore has a common value for all image points. All the points in $P_1$ that lie along a single epipolar line share the same epipolar line in $P_2$. This is established in the following proposition.

Proposition 2a: Let $p_1, p_2 \in P_1$ be two points that lie along some common epipolar line. The epipolar line of $p_1$ and the epipolar line of $p_2$ in $P_2$ coincide.

Proof: All the epipolar lines are parallel. According to the definition of an epipolar line, since $p_1$ and $p_2$ lie along a single epipolar line, both are possible matches of a single point $q$ in $P_2$. Therefore, the epipolar lines of $p_1$ and $p_2$ both pass through $q$, and since epipolar lines are parallel they must coincide. Consequently, rigidity determines the correspondence between epipolar lines, but does not resolve the correspondence within these lines.

When only two images are given, the transformation cannot be fully recovered. The epipolar lines, however, can be recovered from a correspondence set of four noncoplanar points. A linear equation from which the epipolar lines can be computed is given below. We use the following notation. Let $(x_i, y_i) \in P_1$ and $(x'_i, y'_i) \in P_2$ be a pair of corresponding points, namely, the projections of a common point $p_i = (x_i, y_i, z_i)$ in 3-D space. We have $n$ such correspondences (to solve the equation, $n \ge 4$ is required). Denote $\mathbf{x} = (x_1, \ldots, x_n)$, $\mathbf{y} = (y_1, \ldots, y_n)$, $\mathbf{z} = (z_1, \ldots, z_n)$, $\mathbf{x}' = (x'_1, \ldots, x'_n)$, $\mathbf{y}' = (y'_1, \ldots, y'_n)$, and $\mathbf{1} = (1, \ldots, 1) \in \mathbb{R}^n$. According to [Ullman and Basri 1991], $\mathbf{x}$, $\mathbf{y}$, $\mathbf{z}$, $\mathbf{x}'$, $\mathbf{y}'$, and $\mathbf{1}$ are all embedded in a 4-D linear space. This follows from the identities

$$ \mathbf{x}' = s r_{11}\mathbf{x} + s r_{12}\mathbf{y} + s r_{13}\mathbf{z} + t_x\mathbf{1} $$

$$ \mathbf{y}' = s r_{21}\mathbf{x} + s r_{22}\mathbf{y} + s r_{23}\mathbf{z} + t_y\mathbf{1} $$

Consequently, $\{\mathbf{x}, \mathbf{y}, \mathbf{z}, \mathbf{1}\}$ span a 4-D linear space to which $\mathbf{x}'$ and $\mathbf{y}'$ also belong. Therefore, there exist scalars $a_1$, $a_2$, $b_1$, $b_2$, and $c$, not all zero, such that

$$ a_1\mathbf{x} + a_2\mathbf{y} + b_1\mathbf{x}' + b_2\mathbf{y}' + c\,\mathbf{1} = 0 $$

These coefficients are determined (up to a scale factor) by four noncoplanar points. The epipolar lines are immediately derived from this equation: for a point $(x, y) \in P_1$, its epipolar line in $P_2$ is $b_1 x' + b_2 y' = -(a_1 x + a_2 y + c)$. (This result is proved somewhat differently in [Huang and Lee 1989, Lee and Huang 1990].)
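The linear relation above suggests a direct way to compute the epipolar lines: stack one row $(x_i, y_i, x'_i, y'_i, 1)$ per correspondence and take the null vector. The sketch below does this with an SVD on synthetic, noise-free data (assuming NumPy; the data, seed and helper names are hypothetical). It fits from four points only and checks the relation on all twelve. It also compares the result with the closed form $(s r_{32}, -s r_{31}, r_{23}, -r_{13}, r_{13}t_y - r_{23}t_x)$, which follows from eliminating $z$ between the two identities above. For noisy measurements the same SVD yields a least-squares estimate; that is a common choice, not one prescribed by the text.

```python
import numpy as np

def random_rotation(rng):
    """Random 3x3 rotation matrix (QR of a Gaussian matrix, made proper)."""
    q, r = np.linalg.qr(rng.normal(size=(3, 3)))
    q = q @ np.diag(np.sign(np.diag(r)))
    if np.linalg.det(q) < 0:
        q[:, 0] = -q[:, 0]
    return q

def fit_epipolar_coefficients(x, y, xp, yp):
    """Unit vector (a1, a2, b1, b2, c), sign arbitrary, with a1*x + a2*y + b1*x' + b2*y' + c = 0.

    Takes the right singular vector of the smallest singular value; for n >= 4
    noncoplanar, noise-free correspondences this is the exact null vector.
    """
    A = np.column_stack([x, y, xp, yp, np.ones_like(x)])
    _, _, vt = np.linalg.svd(A)
    return vt[-1]

rng = np.random.default_rng(1)
P = rng.uniform(-1.0, 1.0, size=(12, 3))      # synthetic object points
s, R, t = 0.9, random_rotation(rng), np.array([0.3, -0.2])
x, y = P[:, 0], P[:, 1]                        # first image
img2 = s * P @ R.T                             # scaled, rotated points
xp, yp = img2[:, 0] + t[0], img2[:, 1] + t[1]  # second image

coef = fit_epipolar_coefficients(x[:4], y[:4], xp[:4], yp[:4])  # four points only
A_all = np.column_stack([x, y, xp, yp, np.ones_like(x)])
residual = np.abs(A_all @ coef).max()

closed = np.array([s * R[2, 1], -s * R[2, 0], R[1, 2], -R[0, 2],
                   R[0, 2] * t[1] - R[1, 2] * t[0]])
alignment = abs(coef @ closed) / np.linalg.norm(closed)
print(f"residual on all 12 points: {residual:.1e}, |cos| with closed form: {alignment:.9f}")
```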

The epipolar lines break the transformation that relates the images into its planar and nonplanar components. The planar components can be recovered from the epipolar lines, while the nonplanar ones cannot be determined from two images. The translation component perpendicular to the epipolar lines is given by $c$. (The translation components can be discarded altogether if we consider differences between points rather than the points themselves.) Up to a common scale factor, the values of the other coefficients are

$$ a_1 = s\,r_{32}, \qquad a_2 = -s\,r_{31}, \qquad b_1 = r_{23}, \qquad b_2 = -r_{13}. $$

Since $r_{31}^2 + r_{32}^2 = r_{13}^2 + r_{23}^2 = 1 - r_{33}^2$, the scale factor is given by the ratio

$$ s = \sqrt{\frac{a_1^2 + a_2^2}{b_1^2 + b_2^2}}. $$

The relative angle between the epipolar lines determines the planar parts of the rotation, as explained below. A 3-D rotation can be decomposed into a sequence of three successive rotations: a rotation about the Z-axis by an angle $\alpha$, a second rotation about the Y-axis by an angle $\beta$, and a third rotation about the Z-axis by an angle $\gamma$. Under this decomposition the following identities hold:

$$ r_{32} = \sin\alpha\sin\beta, \qquad r_{31} = -\cos\alpha\sin\beta, \qquad r_{23} = \sin\beta\sin\gamma, \qquad r_{13} = \sin\beta\cos\gamma. $$

We therefore obtain that

$$ \alpha = \tan^{-1}\frac{a_1}{a_2}, \qquad \gamma = -\tan^{-1}\frac{b_1}{b_2}, $$

while $\beta$ cannot be determined, since $\sin\beta$ appears only as a common factor of all the coefficients and is absorbed in their unknown scale.

We can visualize this decomposition in the following way. After compensating for the translation and scale changes, we first rotate the image $P_1$ by $\alpha$. Consequently, the epipolar lines in $P_1$ point in the horizontal direction. We then rotate the second image, $P_2$, by $-\gamma$. As a result, the epipolar lines in $P_2$ also point horizontally. The images obtained are related by a rotation about the vertical axis, which is a rotation in depth. Following such a rotation the points move horizontally, that is, along the (rotated) epipolar lines. This motion cannot be recovered since it depends both on the angle of rotation, $\beta$, and on the depth of the points.
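The recovery of the planar parts can be sketched numerically as follows (assuming NumPy; helper names and sample values are hypothetical). The coefficients are known only up to an arbitrary, possibly negative, scale, so $\alpha$ and $\gamma$ are recovered modulo $\pi$. The code uses `arctan2` instead of a plain arctangent only to avoid dividing by zero. The last lines show why $\beta$ is lost: changing it merely rescales the whole coefficient vector.

```python
import numpy as np

def rotation_zyz(alpha, beta, gamma):
    """R = Rz(gamma) @ Ry(beta) @ Rz(alpha)."""
    def rz(a):
        c, s = np.cos(a), np.sin(a)
        return np.array([[c, -s, 0.0], [s, c, 0.0], [0.0, 0.0, 1.0]])
    c, s = np.cos(beta), np.sin(beta)
    ry = np.array([[c, 0.0, s], [0.0, 1.0, 0.0], [-s, 0.0, c]])
    return rz(gamma) @ ry @ rz(alpha)

def epipolar_coefficients(s, R, t):
    """(a1, a2, b1, b2, c) obtained by eliminating z; t = (tx, ty)."""
    return np.array([s * R[2, 1], -s * R[2, 0], R[1, 2], -R[0, 2],
                     R[0, 2] * t[1] - R[1, 2] * t[0]])

def planar_parts(coef):
    """Recover s, alpha, gamma; alpha and gamma are only defined modulo pi."""
    a1, a2, b1, b2, _ = coef
    s = np.sqrt((a1**2 + a2**2) / (b1**2 + b2**2))
    alpha = np.arctan2(a1, a2)   # tan(alpha) = a1 / a2
    gamma = np.arctan2(b1, -b2)  # tan(gamma) = -b1 / b2
    return s, alpha, gamma

def diff_mod_pi(a, b):
    """Distance between two angles, treating angles that differ by pi as equal."""
    d = (a - b) % np.pi
    return min(d, np.pi - d)

s_true, alpha, gamma, t = 1.3, 0.4, -1.1, np.array([0.2, 0.5])
coef = -2.5 * epipolar_coefficients(s_true, rotation_zyz(alpha, 0.9, gamma), t)  # arbitrary scale
s_hat, alpha_hat, gamma_hat = planar_parts(coef)
print(s_hat, diff_mod_pi(alpha_hat, alpha), diff_mod_pi(gamma_hat, gamma))

# beta is not recoverable: two values of beta give proportional coefficient vectors.
c_a = epipolar_coefficients(s_true, rotation_zyz(alpha, 0.9, gamma), t)
c_b = epipolar_coefficients(s_true, rotation_zyz(alpha, 0.3, gamma), t)
print(np.linalg.matrix_rank(np.vstack([c_a, c_b]), tol=1e-10))  # 1
```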
An essentially similar breakup of the transformation was suggested by Ullman [1983]. In his proof, however, a correspondence set of five points was required to recover the planar parts of the transformation. We see here that four noncoplanar points are sufficient, since the epipolar lines can be recovered from four such points, and the breakup is completely described by the epipolar lines.


2.2  Perspective  Projection 


In this section we restate the results presented at the beginning of Section 2 and prove them for the perspective case. We use the following notation. An object point $p$ is denoted by $(zx, zy, z)$. It is projected in $P_1$ to the position $(x, y)$ and in $P_2$ to $(x', y')$. (In the coordinate frame of $P_2$, the 3-D position of the point is denoted by $(z'x', z'y', z')$.)

Proposition 1b: Given a rigid transformation that includes a rotation $R$ and a translation $\mathbf{t}$, and given a point $(x, y) \in P_1$, its corresponding point in $P_2$ lies along the epipolar line given by

$$ (x', y') = \mathbf{u} + a(z)\,\mathbf{v} $$

where $\mathbf{u}, \mathbf{v} \in \mathbb{R}^2$ are constants, and $a$ is a scalar function of $z$.

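Proof: Denote

$$ \begin{pmatrix} x_r \\ y_r \\ z_r \end{pmatrix} = R \begin{pmatrix} x \\ y \\ 1 \end{pmatrix} $$

Note that $x_r$, $y_r$, and $z_r$ are all independent of $z$. Now, since

$$ \begin{pmatrix} z'x' \\ z'y' \\ z' \end{pmatrix} = z \begin{pmatrix} x_r \\ y_r \\ z_r \end{pmatrix} + \begin{pmatrix} t_x \\ t_y \\ t_z \end{pmatrix} $$

we obtain that

$$ x' = \frac{z x_r + t_x}{z z_r + t_z} = \frac{x_r}{z_r} + \frac{t_x - t_z x_r / z_r}{z z_r + t_z}, \qquad y' = \frac{z y_r + t_y}{z z_r + t_z} = \frac{y_r}{z_r} + \frac{t_y - t_z y_r / z_r}{z z_r + t_z} $$

and so we define

$$ \mathbf{u} = \begin{pmatrix} x_r / z_r \\ y_r / z_r \end{pmatrix}, \qquad \mathbf{v} = \begin{pmatrix} t_x - t_z x_r / z_r \\ t_y - t_z y_r / z_r \end{pmatrix}, \qquad a(z) = \frac{1}{z z_r + t_z}. $$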
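The decomposition in Proposition 1b can be checked numerically. The sketch assumes unit focal length (so the 3-D point is $(zx, zy, z)$, as in the notation of this section), NumPy, and an arbitrary example rotation and translation. The function name is hypothetical. It also confirms that with $t_z = 0$ the vector $\mathbf{v}$ is the same for every point, so the epipolar lines are parallel.

```python
import numpy as np

def persp_epipolar_line(x, y, R, t):
    """Return (u, v, a) such that the match of (x, y) in P2 is u + a(z) * v."""
    xr, yr, zr = R @ np.array([x, y, 1.0])  # independent of the depth z
    u = np.array([xr / zr, yr / zr])        # limit of the match as z grows without bound
    v = t[:2] - t[2] * u

    def a(z):
        return 1.0 / (z * zr + t[2])

    return u, v, a

c, s = np.cos(0.2), np.sin(0.2)
R = np.array([[c, 0.0, s], [0.0, 1.0, 0.0], [-s, 0.0, c]])  # rotation about the Y-axis
t = np.array([0.3, -0.2, 0.5])
x, y, z = 0.1, -0.2, 4.0

X2 = R @ (z * np.array([x, y, 1.0])) + t  # 3-D position in the frame of P2
u, v, a = persp_epipolar_line(x, y, R, t)
print(np.allclose(X2[:2] / X2[2], u + a(z) * v))  # True

# With no translation in depth, v = (tx, ty) for every point: parallel epipolar lines.
t0 = np.array([0.3, -0.2, 0.0])
_, v1, _ = persp_epipolar_line(0.1, -0.2, R, t0)
_, v2, _ = persp_epipolar_line(-0.4, 0.3, R, t0)
print(np.allclose(v1, v2))  # True
```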
Parallel epipolar lines are obtained when $t_z = 0$. In this case $\mathbf{v} = (t_x, t_y)$ is independent of the position of the point and depends solely on the transformation. If, however, $t_z \neq 0$, the epipolar lines intersect in one point, called the focus of expansion. This point corresponds to $z = 0$, and its location in $P_2$ is given by

$$ (x', y') = \mathbf{u} + \frac{\mathbf{v}}{t_z} = \left( \frac{t_x}{t_z}, \frac{t_y}{t_z} \right). $$

The location of the focus of expansion in $P_1$ corresponds to the case $\mathbf{v} = 0$. This condition implies the linear equation system

$$ t_z x_r - t_x z_r = 0, \qquad t_z y_r - t_y z_r = 0, $$

from which this location can be retrieved. (Recall that $x_r$, $y_r$, and $z_r$ are linear functions of $x$ and $y$.)
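A sketch of locating both foci of expansion, assuming unit focal length, NumPy, and example values for $R$ and $\mathbf{t}$ (function names are hypothetical). The system above states that $R(x, y, 1)^T$ is parallel to $\mathbf{t}$, so $(x, y, 1)^T$ is parallel to $R^T\mathbf{t}$. The focus in $P_2$ requires $t_z \neq 0$, and the focus in $P_1$ requires a nonzero third component of $R^T\mathbf{t}$.

```python
import numpy as np

def foci_of_expansion(R, t):
    """FOE in P1 (where v = 0) and in P2 (tx/tz, ty/tz); assumes both are finite."""
    foe2 = t[:2] / t[2]
    w = R.T @ t            # (x, y, 1) must be parallel to R^T t
    foe1 = w[:2] / w[2]
    return foe1, foe2

def project(R, t, x, y, z):
    """Position in P2 of the point (zx, zy, z) seen at (x, y) in P1."""
    X2 = R @ (z * np.array([x, y, 1.0])) + t
    return X2[:2] / X2[2]

c, s = np.cos(0.2), np.sin(0.2)
R = np.array([[c, 0.0, s], [0.0, 1.0, 0.0], [-s, 0.0, c]])
t = np.array([0.3, -0.2, 0.5])
foe1, foe2 = foci_of_expansion(R, t)

# Every epipolar line in P2 passes through foe2: two depths of one pixel are collinear with it.
d1 = project(R, t, 0.1, -0.2, 3.0) - foe2
d2 = project(R, t, 0.1, -0.2, 7.0) - foe2
print(abs(d1[0] * d2[1] - d1[1] * d2[0]) < 1e-12)  # True

# At foe1 the vector v vanishes, so its epipolar line degenerates to a point.
xr, yr, zr = R @ np.append(foe1, 1.0)
print(np.allclose(t[:2] - t[2] * np.array([xr, yr]) / zr, 0.0))  # True
```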

Similar  to  the  orthographic  case,  points  that  lie  on  a  common  epipolar  line  in  one 
image  share  the  same  epipolar  line  in  the  other. 

Proposition 2b: Let $p_1, p_2 \in P_1$ be two points that lie along some common epipolar line, and assume that neither $p_1$ nor $p_2$ is the focus of expansion. The epipolar line of $p_1$ and the epipolar line of $p_2$ in $P_2$ coincide.

Proof: If $t_z = 0$ the epipolar lines are parallel and the proof is identical to that of the orthographic case. If $t_z \neq 0$ the epipolar lines in each image intersect in the focus of expansion. Since the points lie along a common epipolar line in $P_1$, there exists a point $q$ in $P_2$, other than the focus of expansion, that is a possible match to both points. Therefore, the epipolar line of $p_1$ and that of $p_2$ intersect in $q$, and since both lines also pass through the focus of expansion, they must coincide.

In the perspective case the transformation can in general be determined (up to a scale factor) from a correspondence set of seven points [Longuet-Higgins 1981, Tsai and Huang 1984]. There is still no proof of whether this is the minimal number. Roach and Aggarwal [1979] showed, by counting the number of unknowns, that five points may be sufficient. For the sake of completeness we review in Appendix A one method to recover the transformation from eight corresponding points using essentially linear operations. This method appeared in Tsai and Huang [1984].
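The sketch below is not necessarily the procedure reviewed in Appendix A. It is the textbook linear eight-point estimate of the essential matrix $E = [\mathbf{t}]_\times R$, for which $(x', y', 1)\,E\,(x, y, 1)^T = 0$ holds for every match. It assumes NumPy, unit focal length, and synthetic noise-free data (all names and values are hypothetical). It stops at $E$; decomposing $E$ into $R$ and $\mathbf{t}$ is omitted.

```python
import numpy as np

def skew(w):
    """Matrix [w]_x with [w]_x @ q == np.cross(w, q)."""
    return np.array([[0.0, -w[2], w[1]], [w[2], 0.0, -w[0]], [-w[1], w[0], 0.0]])

def eight_point(p1, p2):
    """Linear estimate of E (up to scale) from n >= 8 matches; p1, p2 are (n, 2) arrays."""
    x, y = p1[:, 0], p1[:, 1]
    xp, yp = p2[:, 0], p2[:, 1]
    # One row per match: the coefficients of E (row-major) in (x', y', 1) E (x, y, 1)^T = 0.
    A = np.column_stack([xp * x, xp * y, xp, yp * x, yp * y, yp, x, y, np.ones_like(x)])
    E = np.linalg.svd(A)[2][-1].reshape(3, 3)
    # Enforce the essential-matrix structure: singular values (sigma, sigma, 0).
    U, S, Vt = np.linalg.svd(E)
    sigma = (S[0] + S[1]) / 2.0
    return U @ np.diag([sigma, sigma, 0.0]) @ Vt

rng = np.random.default_rng(2)
n = 12
X1 = np.column_stack([rng.uniform(-1, 1, n), rng.uniform(-1, 1, n), rng.uniform(4, 8, n)])
c, s = np.cos(0.15), np.sin(0.15)
R = np.array([[c, 0.0, s], [0.0, 1.0, 0.0], [-s, 0.0, c]])
t = np.array([0.5, 0.1, 0.3])
X2 = X1 @ R.T + t                                       # same points in the second camera frame
p1, p2 = X1[:, :2] / X1[:, 2:], X2[:, :2] / X2[:, 2:]   # unit-focal-length projections

E_hat, E_true = eight_point(p1, p2), skew(t) @ R
alignment = abs(np.sum(E_hat * E_true)) / (np.linalg.norm(E_hat) * np.linalg.norm(E_true))
print(f"|cos| between estimated and true E: {alignment:.9f}")
```

With real, noisy points one would normally also normalize the image coordinates before building the system; that refinement is left out to keep the sketch short.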

It is worth noting that although in the perspective case the transformation can be recovered from two images, the computation may in many cases be unstable. This happens when the object is relatively distant from the camera, in which case depth differences are relatively small and perspective distortions are negligible, and when the translation in depth is small, in which case the epipolar lines are nearly parallel. These cases are essentially similar to the orthographic case. In both cases the recovered transformation is unstable, and a third image may be required to recover it reliably. The epipolar lines, however, remain stable, since they depend mainly on those components of the transformation that can be measured reliably.


2.3  Extensions 

In  the  previous  discussion  we  showed  that  rigidity  determines  the  correspondence  up  to 
epipolar  lines  and  that  the  position  of  points  along  these  lines  is  determined  by  their 
depth  values.  We  also  showed  that  the  epipolar  lines  can  be  recovered  from  a  small 
set  of  corresponding  points.  In  this  section  we  consider  two  additional  cases  to  which 
epipolar  lines  apply.  These  cases  include  images  of  objects  that  undergo  general  affine 
transformation  and  contour  images  of  rigid  objects  with  smooth  bounding  surfaces. 

An affine transformation in 3-D space is composed of a general linear transformation followed by a translation. In addition to all the rigid transformations, the set of affine transformations also contains stretch and shear. As in the rigid case, in a pair of images of an object that undergoes an affine transformation, corresponding points lie along epipolar lines. This is true both when the images are orthographic and when they are perspective projections of the object. This follows from the fact that in proving the results above we never used the special properties of the rotation matrix.
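A quick numerical illustration of this remark for the orthographic case, assuming NumPy and a random general linear map (synthetic data; the seed and sizes are arbitrary). The same five-coefficient relation used in Section 2.1 is fitted from four points of an affinely transformed object, and it holds for all ten points.

```python
import numpy as np

rng = np.random.default_rng(3)
P = rng.uniform(-1.0, 1.0, size=(10, 3))   # object points before the transformation
L = rng.normal(size=(3, 3))                # general linear map (rotation, stretch, shear)
t = rng.normal(size=3)
Q = P @ L.T + t                            # object points after the affine transformation
x, y, xp, yp = P[:, 0], P[:, 1], Q[:, 0], Q[:, 1]  # two orthographic images

A = np.column_stack([x, y, xp, yp, np.ones_like(x)])
coef = np.linalg.svd(A[:4])[2][-1]         # (a1, a2, b1, b2, c) from four points
print(f"max residual over all ten points: {np.abs(A @ coef).max():.1e}")
```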

When only a pair of images is given, it is impossible to tell whether the objects in these images are moving rigidly or undergoing affine (nonrigid) transformations. Ullman and Basri [1991] (see also [Poggio 1990]) showed that under orthographic projection the set of images of a rigid object is contained in a 4-D linear space, and that additional (quadratic) constraints distinguish between these images and other vectors in this space. These other vectors are, in fact, images obtained by applying a general 3-D affine transformation to the object. The quadratic constraints cannot be recovered from two images. Hence, it is impossible to distinguish between the two cases when only two images are given. A similar ambiguity holds under perspective projection. It is worth noting that general affine transformations approximate the way moving objects are observed in movies from different viewpoints. This effect has been known since 1859 as the La Gournerie Paradox and was recently discussed by Jacobs [1991].

A second interesting case is that of rigid objects with smooth surfaces. The bounding contours of such an object are generated by surface patches that are tangent to the line of sight. These patches are usually referred to as the rim [Koenderink and Van Doorn 1979] or the contour generator [Marr 1977] of the object. Since the surface of the object is smooth, when the object rotates in depth a new set of surface patches, now tangent to the new line of sight, replaces the original rim, generating a new set of bounding contours. Establishing correspondence between the original and the new bounding contours of the object is therefore problematic, since in addition to the rigid transformation the contours also undergo some arbitrary motion that depends on the exact shape of the object.

Tracing the positions of these contours is useful for any shape reconstruction or object recognition scheme that is based on contour matching. A method to predict the appearance of objects with smooth bounding surfaces for recognition was recently developed [Basri and Ullman 1988]. The method assumes orthographic projection and uses the 3-D curvature of points along the contours to follow their change in position with viewpoint. The curvature values were computed from a few images of the object by matching the contours in these images.

The following observation demonstrates that epipolar lines are useful in determining correspondences between orthographic images of objects with smooth bounding surfaces. We first look at the simpler case of an object that rotates about the vertical axis. Let $p$ be a rim point, and take a horizontal section of the object through $p$. (Namely, if $p = (x_0, y_0, z_0)$, we consider the plane $y = y_0$.) The intersection of the surface of the object with this plane forms a space curve, $C$. When the object rotates, the rim point $p$ changes its position on the object along $C$. Denote the new rim point by $p'$. Since this is a rotation about the Y-axis, the epipolar lines in both images are horizontal. Therefore, all the points on $C$, including $p$ and $p'$, are projected to a common epipolar line in both images.

We now extend this observation to general rigid transformations. Rotation is the only component that affects the rim; translation and scaling do not change the rim and can therefore be disregarded. A 3-D rotation can be decomposed into three successive rotations, about the Z-, Y-, and Z-axes (the same decomposition used in Section 2.1). As in Section 2.1, we apply the first rotation to the first image and (the inverse of) the last rotation to the second image. Both rotations are image rotations, and they do not change the rim. They rotate the epipolar lines in the two images into the horizontal direction (see Section 2.1). Therefore, after applying these rotations we obtain the situation described above for the simpler case: the two images are related by a rotation about the vertical axis, and their epipolar lines are horizontal. Consequently, the observed positions of the rim points change along epipolar lines.

Figure 1 shows the epipolar lines in two orthographic projections of a VW car. Notice that the matching between silhouette points along epipolar lines is good, although these points are generated by smooth surfaces.
Figure 1: Epipolar lines in two orthographic projections of a VW car. Note that corresponding points lie along epipolar lines. The silhouette contours deserve special attention, since they are generated by smooth surfaces.

3  Resolving  Point  Correspondence 

In the previous section we showed that rigidity alone is insufficient to solve the
correspondence problem uniquely from two images. Rigidity divides the images into epipolar
lines whose matching is determined by the transformation that separates the images, but
the correspondence of points within the lines cannot be resolved. In this section we
examine  the  problem  of  establishing  correspondence  in  three  or  more  images.  We  show 
that,  similar  to  the  case  of  two  images,  the  correspondence  is  not  determined  uniquely. 
Additional  images,  however,  provide  constraints  that  can  be  used  to  solve  the  problem 
heuristically  (e.g.,  the  trinocular  stereovision  algorithm  [Yachida  1986]).  We  discuss 
several  additional  constraints  that  can  be  used  together  with  epipolar  lines  to  find  the 
correspondence  between  images.  These  methods  were  implemented  and  the  results  are 
presented  below. 

It  should  be  noted  that  the  use  of  epipolar  lines  to  determine  correspondence  is  limited 
to  those  regions  in  the  images  that  are  consistent  with  a  rigid  (or  affine)  transformation. 
When the images contain a number of rigid objects moving independently, each of the
objects  may  determine  a  different  set  of  epipolar  lines.  A  segmentation  process  must  be 
applied  to  separate  these  objects  and  divide  the  images  into  regions  with  consistent  sets 
of  epipolar  lines.  We  shall  not  address  the  segmentation  problem  in  this  paper. 


3.1  Correspondence  from  Three  Images 


We have so far explored the establishment of correspondence from two images. We
showed that the correspondence between points in the images cannot be uniquely resolved
even if the transformation is known. We now address the following question: can
the correspondence be resolved when three or more images are considered? Structure
from motion theory demonstrates that the answer to this question is not trivial. When
correspondence is given, two images are not sufficient under orthographic projection to
recover the transformation, but three are [Ullman 1979, Huang and Lee 1989]. The
correspondence problem is nevertheless different from the structure from motion problem.
Point correspondence cannot be resolved by using any number of additional images. Yet,
additional images provide information that can be used to filter out less likely solutions.

Proposition 3 establishes that point correspondence cannot be resolved from any
number of images. Let $P_1, P_2, \ldots, P_k$ be $k$ images. Let $(x_i, y_i)$, $1 \le i \le k$, be the locations
of a point $p = (x, y, z)$ in $P_i$. (Assume w.l.o.g. that $x_1 = x$ and $y_1 = y$.) Let $T_i$, $2 \le i \le k$,
be the rigid transformation applied to $p$ in $P_i$, assuming orthographic projection.

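Proposition 3: Given $T_2, \ldots, T_k$, the set of possible locations of $p$ in $P_2, \ldots, P_k$ forms
a straight line in $\mathbb{R}^{2(k-1)}$:

$$(x_2, y_2, \ldots, x_k, y_k) = \mathbf{u} + z\,\mathbf{v}$$

where $\mathbf{u}, \mathbf{v} \in \mathbb{R}^{2(k-1)}$ are constants.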

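Proof: This is obtained simply by defining $\mathbf{u} = (\mathbf{u}_2, \ldots, \mathbf{u}_k)$ and $\mathbf{v} = (\mathbf{v}_2, \ldots, \mathbf{v}_k)$, where
$\mathbf{u}_i, \mathbf{v}_i \in \mathbb{R}^2$ are the corresponding vectors $\mathbf{u}$ and $\mathbf{v}$ from Proposition 1a.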

This proposition implies that the number of possible correspondences for each point
is infinite: every possible assignment of $z$ yields a different location of the point in
all of the images. An equivalent claim can be made in the case of perspective projection.
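To make Proposition 3 concrete, the following sketch builds $\mathbf{u}$ and $\mathbf{v}$ explicitly. It assumes that each $T_i$ is given as a rotation $R_i$ and translation $\mathbf{t}_i$ acting as $R_i X + \mathbf{t}_i$ and followed by orthographic projection. This parameterization and the function name `candidate_line` are illustrative assumptions. Each image contributes two depth-independent coordinates to $\mathbf{u}$ and two entries of the third column of $R_i$ to $\mathbf{v}$.

```python
import numpy as np


def rot_x(theta):
    c, s = np.cos(theta), np.sin(theta)
    return np.array([[1.0, 0.0, 0.0], [0.0, c, -s], [0.0, s, c]])


def rot_y(theta):
    c, s = np.cos(theta), np.sin(theta)
    return np.array([[c, 0.0, s], [0.0, 1.0, 0.0], [-s, 0.0, c]])


def candidate_line(x, y, transforms):
    """Return (u, v) such that the stacked positions of (x, y, z) in
    P_2..P_k equal u + z * v under orthographic projection.

    transforms: list of (R_i, t_i) pairs, one per image P_2..P_k.
    """
    xy = np.array([x, y])
    # Depth-independent part: upper-left 2x2 block of R_i applied to (x, y), plus translation.
    u = np.concatenate([R[:2, :2] @ xy + t[:2] for R, t in transforms])
    # Depth-dependent direction: first two entries of the third column of R_i.
    v = np.concatenate([R[:2, 2] for R, _ in transforms])
    return u, v


transforms = [(rot_y(0.4), np.array([0.5, -0.2, 0.0])),
              (rot_x(0.3) @ rot_y(-0.2), np.array([0.1, 0.3, 0.0]))]
u, v = candidate_line(1.0, 2.0, transforms)
for z in (0.0, 2.5):
    print(z, u + z * v)  # one point of the line per hypothesized depth
```

Each hypothesized $z$ picks one point on the line, which is why the correspondence stays ambiguous without additional constraints.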

There is, however, one additional consequence of this proposition. Determining the
correspondence between two of the images immediately implies the correspondence in
all other images. This property suggests a hypothesis-verification heuristic to recover
correspondence. The algorithm first selects a point in the first image, hypothesizes its
correspondence in the second image, computes accordingly its position in the third, and
then verifies its appearance in the predicted location. This algorithm is used in trinocular
stereopsis [Yachida 1986]. The algorithm can be defined in two versions. The first requires
the transformation between the images; it predicts the position of points in the third
image by explicitly computing their depth values. The second requires the epipolar lines
between all pairs of images; it predicts the position of points in the third image by
intersecting epipolar lines.

Version  1. 
1. Select a point $p = (x, y) \in P_1$ and find its epipolar line $A$ in $P_2$.

2. For all candidates $q_1, \ldots, q_n$ along $A$, compute the corresponding depth values $z_1, \ldots, z_n$.

3. For every possible depth value $z_i$, compute the position of the point $(x, y, z_i)$
in $P_3$ and verify its actual appearance at this location.
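A minimal sketch of Version 1. It assumes orthographic projection, transformations given relative to $P_1$ as $X_i = R_i X + \mathbf{t}_i$, and a third image reduced to a list of feature points. The names `depth_from_candidate` and `version1` and the tolerance `tol` are hypothetical and are not the implementation used in this paper.

```python
import numpy as np


def rot_x(theta):
    c, s = np.cos(theta), np.sin(theta)
    return np.array([[1.0, 0.0, 0.0], [0.0, c, -s], [0.0, s, c]])


def rot_y(theta):
    c, s = np.cos(theta), np.sin(theta)
    return np.array([[c, 0.0, s], [0.0, 1.0, 0.0], [-s, 0.0, c]])


def depth_from_candidate(x, y, q, R2, t2):
    """Depth z for which (x, y, z) projects to candidate q in P_2.

    q is assumed to lie on the epipolar line u + z v; z is solved in the
    least-squares sense so small deviations from the line are tolerated.
    """
    u = R2[:2, :2] @ np.array([x, y]) + t2[:2]
    v = R2[:2, 2]
    return float(v @ (np.asarray(q, dtype=float) - u) / (v @ v))


def version1(p, candidates, T2, T3, features3, tol=1.0):
    """Keep candidates whose predicted position in P_3 lies within `tol`
    of some observed feature point in P_3."""
    x, y = p
    R3, t3 = T3
    accepted = []
    for q in candidates:
        z = depth_from_candidate(x, y, q, *T2)            # step 2: depth
        predicted = (R3 @ np.array([x, y, z]) + t3)[:2]    # step 3: predict in P_3
        dist = np.linalg.norm(features3 - predicted, axis=1).min()
        if dist <= tol:                                     # step 3: verify
            accepted.append((tuple(q), z))
    return accepted


T2 = (rot_y(0.5), np.zeros(3))
T3 = (rot_x(0.9), np.zeros(3))
X = np.array([1.0, 2.0, 3.0])                               # ground-truth point
true_q = (T2[0] @ X)[:2]
false_q = (T2[0] @ np.array([1.0, 2.0, 10.0]))[:2]          # same line, wrong depth
features3 = np.array([(T3[0] @ X)[:2], [100.0, 100.0]])
result = version1((1.0, 2.0), [true_q, false_q], T2, T3, features3)
print(result)
```

Both candidates lie on the same epipolar line $A$. Only the third image separates them.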

Version  2. 

1. Select a point $p = (x, y) \in P_1$, and find its epipolar lines $A$ in $P_2$ and $B$ in $P_3$.

2. For all candidates $q_1, \ldots, q_n$ along $A$, compute their epipolar lines $C_1, \ldots, C_n$ in $P_3$.

3. Intersect each of the lines $C_1, \ldots, C_n$ with $B$ and verify the actual appearance of $p$
at these locations.
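Version 2 can be sketched with homogeneous line coordinates, where the cross product of two points gives the line through them and the cross product of two lines gives their intersection. The sketch assumes orthographic projection and transformations given relative to $P_1$ as $X_i = R_i X + \mathbf{t}_i$. The helpers `epipolar_line`, `relative`, and `intersect` are hypothetical names. The parallel-line check in `intersect` corresponds to the requirement that the epipolar lines $C_i$ and $B$ be non-parallel.

```python
import numpy as np


def rot_x(theta):
    c, s = np.cos(theta), np.sin(theta)
    return np.array([[1.0, 0.0, 0.0], [0.0, c, -s], [0.0, s, c]])


def rot_y(theta):
    c, s = np.cos(theta), np.sin(theta)
    return np.array([[c, 0.0, s], [0.0, 1.0, 0.0], [-s, 0.0, c]])


def epipolar_line(point, R, t):
    """Homogeneous line (a, b, c), a*x + b*y + c = 0, in the target image for a
    source-image point, when target coordinates are R X + t (orthographic)."""
    u = R[:2, :2] @ np.asarray(point, dtype=float) + t[:2]
    v = R[:2, 2]   # if v is zero (no out-of-plane rotation) the line is undefined
    return np.cross(np.append(u, 1.0), np.append(u + v, 1.0))


def relative(T_from, T_to):
    """Transformation from P_from to P_to when both are given relative to P_1."""
    (Ra, ta), (Rb, tb) = T_from, T_to
    R = Rb @ Ra.T
    return R, tb - R @ ta


def intersect(l1, l2, eps=1e-9):
    h = np.cross(l1, l2)
    if abs(h[2]) < eps:
        raise ValueError("epipolar lines are (nearly) parallel")
    return h[:2] / h[2]


T2 = (rot_y(0.5), np.array([0.2, 0.0, 0.0]))
T3 = (rot_x(0.6), np.array([0.0, -0.1, 0.0]))
X = np.array([1.0, 2.0, 3.0])
p = X[:2]                                   # the point in P_1
q = (T2[0] @ X + T2[1])[:2]                 # the correct candidate on A
B = epipolar_line(p, *T3)                   # epipolar line of p in P_3
C = epipolar_line(q, *relative(T2, T3))     # epipolar line of q in P_3
print(intersect(B, C))                      # predicted location of p in P_3
```

No depth value is computed explicitly. The prediction comes entirely from intersecting lines, as the text describes.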

The two versions of the algorithm are essentially similar. The first version uses the
transformation between the images to compute depth values. The second version replaces
this computation by intersecting epipolar lines. Note that the transformation can be
computed from three images using four non-coplanar points [Ullman 1979]. The second
version can be used only if the epipolar lines $C_i$ intersect $B$. This requirement means
that, for every image, its epipolar lines with respect to the other images should not be
parallel. (For example, the epipolar lines in $P_3$ with respect to $P_1$ should not be
parallel to its epipolar lines with respect to $P_2$, and so forth.) This requirement is
equivalent to requiring the transformations to be independent. Unless this condition is
met, structure-from-motion algorithms cannot recover the transformation from
correspondence [Huang and Lee 1989].

One observation following from this algorithm is that, since epipolar lines are defined for
pairs of images, one can use different sets of anchor points to recover the epipolar lines in
each of the pairs. This differs from most existing structure from motion algorithms,
which require the set of anchor points to be identical in all three images.

Note that the use of three images rather than two is reasonable, since three images are
required  to  recover  structure  from  motion  under  orthographic  projection  [Ullman  1979, 
Huang  and  Lee  1989]  and  to  form  a  viewer-centered  representation  for  a  rigid  object 
[Ullman  and  Basri  1991]. 

The algorithm handles both rigid objects and objects that undergo general
3-D affine transformations. There is, however, some difference between the two cases.
When four or more images are considered, certain configurations of epipolar lines may be
consistent with some affine transformations but with no rigid ones. This follows from
[Basri and Ullman 1991]: three images are necessary to determine the functional
constraints that distinguish rigid transformations from affine ones, and these constraints
then restrict the possible configurations of the epipolar lines in larger sets of images.

Stability problems are anticipated in applying the above algorithm when contour
pieces are tangential to the epipolar lines. In this case, the image in which such an event
occurs should be used as the third image. Notice that the three images are symmetric, in
the sense that the algorithm can treat them in any order.

It should be stressed that neither version guarantees uniqueness. Occasionally,
several candidates may be found consistent with all three images, and further pruning
between these candidates is required. In general, the algorithm gives better results for
sparse images than for dense ones, and for images with arbitrarily distributed texture than
for images with uniform texture. (Density refers here to the number of points actually
considered by the algorithm relative to the total area of the images.) A common way to
reduce the density of an image is to consider its edge map. Edge images, however, are in
general still too dense, and a naive implementation of the algorithm would fail to provide
a unique solution for many of the points. To avoid this problem we suggest applying the
matching procedure to edges rather than to points, using the assumption that continuous
edges tend to remain continuous in all images. Unlike Ayache and Lustman [1987], our
implementation is not confined to straight line segments but is applied to arbitrarily
curved ones. We exploit the shape variance of image contours to discriminate between
correct and false matches.
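The paper does not specify how a predicted contour is compared with the third image. One simple possibility is to score each candidate by the fraction of its predicted points that fall within a pixel tolerance of an edge point. The scoring rule, the tolerance, and the names `contour_support` and `rank_candidates` are all hypothetical. A candidate that only crosses an edge picks up support from a few points, which mirrors the false candidates whose points partly match.

```python
import numpy as np


def contour_support(predicted, edge_points, tol=1.5):
    """Fraction of predicted contour points lying within `tol` of some edge point."""
    predicted = np.asarray(predicted, dtype=float)
    edges = np.asarray(edge_points, dtype=float)
    # Brute-force pairwise distances (n_predicted x n_edges); fine for a sketch.
    d = np.linalg.norm(predicted[:, None, :] - edges[None, :, :], axis=2)
    return float(np.mean(d.min(axis=1) <= tol))


def rank_candidates(predictions, edge_points, tol=1.5):
    """Index of the best-supported predicted contour, plus all scores."""
    scores = [contour_support(p, edge_points, tol) for p in predictions]
    return int(np.argmax(scores)), scores


edges = [(x, 0.0) for x in range(10)]           # a horizontal edge in the third image
good = [(x + 0.3, 0.5) for x in range(10)]      # prediction that follows the edge
bad = [(5.0, y) for y in range(-5, 5)]          # prediction that only crosses it
best, scores = rank_candidates([good, bad], edges)
print(best, scores)
```

Scoring whole contours rather than isolated points is what lets the continuity assumption prune candidates that happen to hit a few edge pixels.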

The  modified  algorithm  was  implemented  and  run  on  real  images.  An  example  is  given 
in  Figures  2-4.  In  these  figures  correspondence  was  sought  between  three  edge  images  of 
a  VW  car  (Figure  2).  We  first  selected  a  contour  from  the  first  image.  Then  we  found 
all  the  contours  in  the  second  image  that  could  possibly  match  the  selected  contour. 
For  each  of  the  candidates  we  computed  their  location  in  the  third  image.  We  repeated 
this  process  for  a  number  of  contours.  Figure  3  shows  the  best  candidates  projected  to 
the third image. Figure 4 shows some of the other candidates projected. None of these
candidates matches an actual contour (although some of their points do). The results of
this algorithm were used to create object models for recognition. An example of the use
of these models can be found in [Ullman and Basri 1991].


3.2 Alternative Constraints

In  this  section  we  briefly  discuss  several  constraints  that,  combined  with  the  epipolar  lines, 
can  be  used  for  establishing  point  correspondence.  The  first  constraint  is  traditionally 
referred  to  as  the  ordering  constraint.  Most  objects  are  opaque.  Contour  segments  (and 
points)  on  such  objects  retain  their  spatial  order  from  different  viewpoints.  Therefore,  a 
contour  segment  B  that  lies  between  two  contour  segments,  A  and  C,  in  one  image  would 
in  general  match  some  contour  segment  B',  which  lies  between  the  two  corresponding 




Figure  2:  Epipolar  lines  in  three  images  of  a  VW  car.  Every  image  contains  one  set  of  epipolar  lines 
against  each  of  the  other  two  images. 


Figure  3:  Application  of  the  three  images  algorithm  to  four  contour  pieces  selected  from  the  car  in 
Figure  2(a).  (The  selected  contours  include  the  roof  silhouette,  the  front  window,  the  rear  side  window, 
and the bottom silhouette.) (a) The best prediction found by the algorithm for the four contour pieces.
(b) This prediction overlapped with the actual (third) image.
Figure 4: Correspondence candidates that were not selected by the algorithm because their predictions
poorly matched the third image. (a) Prediction of false candidates. (b) This prediction overlapped with
the actual image. (c) Another prediction of false candidates. (d) This prediction overlapped with the
actual image.
contour segments, A' and C', respectively. (Notice that the relations right, left, up, and down
can still change, as in the case of a 180° rotation around the line of sight.)
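The ordering constraint is often enforced with dynamic programming along a pair of corresponding epipolar lines. This is a standard stereo technique and not necessarily the one used in this paper. The sketch below assumes scalar feature descriptors listed in their order along each line and a fixed cost for leaving a feature unmatched, for example because it is occluded. The descriptor values, `skip_cost`, and the name `ordered_match` are hypothetical.

```python
def ordered_match(feats1, feats2, skip_cost=1.0):
    """Order-preserving matching of features along corresponding epipolar lines.

    feats1, feats2: scalar descriptors listed in their order along each line.
    Returns (pairs, total_cost); pairs are (i, j) index pairs with i and j
    both increasing, so no two matches cross.
    """
    n, m = len(feats1), len(feats2)
    INF = float("inf")
    cost = [[INF] * (m + 1) for _ in range(n + 1)]
    move = [[None] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(n + 1):
        for j in range(m + 1):
            options = []
            if i > 0 and j > 0:  # match feature i-1 with feature j-1
                options.append((cost[i - 1][j - 1] + abs(feats1[i - 1] - feats2[j - 1]), "match"))
            if i > 0:            # feature i-1 on the first line stays unmatched
                options.append((cost[i - 1][j] + skip_cost, "skip1"))
            if j > 0:            # feature j-1 on the second line stays unmatched
                options.append((cost[i][j - 1] + skip_cost, "skip2"))
            if options:
                cost[i][j], move[i][j] = min(options)
    # Backtrack from the full-length prefix pair to recover the matches.
    pairs, i, j = [], n, m
    while i > 0 or j > 0:
        step = move[i][j]
        if step == "match":
            pairs.append((i - 1, j - 1))
            i, j = i - 1, j - 1
        elif step == "skip1":
            i -= 1
        else:
            j -= 1
    return pairs[::-1], cost[n][m]


pairs, total = ordered_match([0.1, 0.5, 0.9], [0.1, 0.3, 0.5, 0.9])
print(pairs, total)
```

Because the recursion only moves forward along both lines, a segment B between A and C can only be matched to something between A' and C', which is exactly the constraint stated above.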

Other cues that may be helpful to resolve the correspondence are parallelism and
symmetry. If a pair of contour segments is parallel or symmetric in one image, their
corresponding contour segments in the second image are often parallel or symmetric,
respectively. Resolving the correspondence for one segment would therefore suggest
a solution for the other. It is worth mentioning, however, that perspective
projection does not preserve parallelism, and that symmetric components often appear
skewed in the image under both projections. Incorporating these cues into a process of
resolving the correspondence may therefore be fairly difficult.

Epipolar lines can be used to improve the correspondence achieved under aperture
conditions. Under these conditions, the matching between contours is given only along the
direction perpendicular to the contours [Marr and Ullman 1981]. Common techniques to correct
the matching use iterative computation to maximize the smoothness of the flow [Hildreth
1984], use sequences of images to find a rigidly consistent solution [Ullman 1984], or
compute a smooth, locally affine solution [Burt et al. 1990, Bachelder and Ullman 1991].
The epipolar line technique offers an exact solution to the aperture problem for full rigid
motion that is computationally simple and resolves the correspondence from as few
as two images.
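In a contour point's second-image position $p'$, the aperture measurement fixes only the component along the contour normal $\mathbf{n}$, that is, $\mathbf{n}\cdot p' = \mathbf{n}\cdot p + s$. The epipolar line $a x' + b y' + c = 0$ supplies a second linear equation, so $p'$ follows from a $2\times 2$ system. This is a minimal sketch with hypothetical names (`aperture_epipolar_match`, `normal_disp`). The singular case corresponds to the contour running parallel to the epipolar line, the instability noted earlier.

```python
import numpy as np


def aperture_epipolar_match(p, normal, normal_disp, line):
    """Full second-image position of contour point p.

    normal: unit normal of the contour at p (the only measurable direction).
    normal_disp: measured displacement of the contour along `normal`.
    line: homogeneous epipolar line (a, b, c) of p in the second image.
    Solves  n . p' = n . p + normal_disp  together with  a x' + b y' + c = 0.
    """
    p = np.asarray(p, dtype=float)
    n = np.asarray(normal, dtype=float)
    a, b, c = line
    A = np.array([n, [a, b]])
    rhs = np.array([n @ p + normal_disp, -c])
    if abs(np.linalg.det(A)) < 1e-9:
        raise ValueError("contour is parallel to the epipolar line; match is unstable")
    return np.linalg.solve(A, rhs)


# Horizontal contour at p = (1, 1): only the vertical displacement (3) is observed.
# The epipolar line 0.5 x - y + 2.5 = 0 resolves the tangential component.
print(aperture_epipolar_match((1.0, 1.0), (0.0, 1.0), 3.0, (0.5, -1.0, 2.5)))
```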

Figure 5 compares the matching obtained under aperture conditions with the matching
obtained using epipolar lines for two car silhouettes. It should be noted that in
general the aperture problem is associated with short range motion applications. In this
case the computation of epipolar lines tends to be unstable. One way to overcome this
problem is to recover the epipolar lines for a sequence of images, such that the difference
between each pair of consecutive images is small but the overall transformation
accumulated along the sequence is large. Alternatively, if two “distant” images are provided,
the images may first be roughly aligned before aperture matching takes place.


4  Summary 

The recovery of shape from a motion sequence generally requires establishing
correspondence between the points in the images. This task is particularly difficult when the
images are taken from viewpoints that are relatively distant from one another,
conditions referred to as “long range motion”. Establishing point correspondence under these
conditions is important for constructing both object-centered and viewer-centered
representations for object recognition. Such representations tend to be more stable when the
images from which they are constructed are separated by relatively large transformations.
Information about the shape of objects and the transformations they undergo can be
used to guide the matching process. In this paper we reviewed the constraints imposed
on the correspondence by rigid transformations and extended them to include images
of objects that undergo general 3-D affine transformations as well as rigid objects with
smooth surfaces. In all these cases the images are divided into epipolar lines whose
correspondence is determined by the transformation, but the correspondence of points
within the lines cannot be recovered. The epipolar lines can be computed from a small
set of anchor points.

The correspondence is not determined uniquely even when three or more images are
considered. Additional images can be used, however, in a heuristic algorithm to determine
point correspondence. One such algorithm is the trinocular stereovision algorithm [Yachida
1986], which is designed to work with sparse images and in the absence of uniform texture.
We extended this algorithm to handle arbitrarily curved edge images and applied it to
images of natural objects. We discussed the use of other constraints, such as ordering,
parallelism, and symmetry, in solving the correspondence problem. Finally, we showed
that epipolar lines can be used to improve matching obtained under aperture conditions.
The techniques described in this paper were implemented and used to construct
viewer-centered models for object recognition.
Appendix A

In  this  appendix  we  show  how  the  transformation  can  be  recovered  (up  to  a  scale  factor) 
from  two  images  under  perspective  projection  using  eight  corresponding  points.  This  is 
a repetition of the method presented in [Tsai and Huang 1984].

Let $P_1$ and $P_2$ be two perspective images of a rigid object obtained by a rotation $R$
and a translation $\mathbf{t}$ in 3-D space. Denote by $\mathbf{r}_x$, $\mathbf{r}_y$, and $\mathbf{r}_z$ the three row vectors of $R$, and by
$(t_x, t_y, t_z)$ the three translation components. Note: we can substitute $R$ in this analysis
with any $3 \times 3$ matrix.

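We define

$$\mathbf{a} = t_y\mathbf{r}_x - t_x\mathbf{r}_y, \qquad \mathbf{b} = t_y\mathbf{r}_z - t_z\mathbf{r}_y, \qquad \mathbf{c} = t_z\mathbf{r}_x - t_x\mathbf{r}_z$$

Note that $\mathbf{a}$, $\mathbf{b}$, and $\mathbf{c}$ are vectors in $\mathbb{R}^3$.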
Let $(x_i, y_i) \in P_1$ and $(x_i', y_i') \in P_2$ be a pair of corresponding points, and denote $\mathbf{p}_i =
(x_i, y_i, 1)$. Then the following equation holds:

$$\mathbf{a}\cdot\mathbf{p}_i = x_i'\,(\mathbf{b}\cdot\mathbf{p}_i) + y_i'\,(\mathbf{c}\cdot\mathbf{p}_i)$$

When anchor points are given, $\mathbf{p}_i$, $x_i'$, and $y_i'$ are known, while the vectors $\mathbf{a}$, $\mathbf{b}$, and $\mathbf{c}$ are
not. These vectors contain nine components, and the equation is linear and homogeneous
in their components. Therefore, $\mathbf{a}$, $\mathbf{b}$, and $\mathbf{c}$ can be recovered up to a scale factor using
eight anchor points. Once the system is solved, we can recover the parameters of the
transformation (up to a scale factor in the translation components) using the following
identities

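$$\|\mathbf{a}\|^2 = t_x^2 + t_y^2, \qquad \|\mathbf{b}\|^2 = t_y^2 + t_z^2, \qquad \|\mathbf{c}\|^2 = t_x^2 + t_z^2$$

And

$$\mathbf{a}\cdot\mathbf{b} = t_x t_z, \qquad \mathbf{a}\cdot\mathbf{c} = t_y t_z, \qquad \mathbf{b}\cdot\mathbf{c} = -t_x t_y$$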

The  translation  components  are  therefore  given  (up  to  a  scale  factor)  by 

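$$t_x^2 = \tfrac{1}{2}\left(\|\mathbf{a}\|^2 - \|\mathbf{b}\|^2 + \|\mathbf{c}\|^2\right)$$

$$t_y^2 = \tfrac{1}{2}\left(\|\mathbf{a}\|^2 + \|\mathbf{b}\|^2 - \|\mathbf{c}\|^2\right)$$

$$t_z^2 = \tfrac{1}{2}\left(-\|\mathbf{a}\|^2 + \|\mathbf{b}\|^2 + \|\mathbf{c}\|^2\right)$$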

And  the  rotation  matrix  can  be  retrieved  from 

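$$\begin{pmatrix} t_y & -t_x & 0 \\ 0 & -t_z & t_y \\ t_z & 0 & -t_x \end{pmatrix} R = \begin{pmatrix} \mathbf{a} \\ \mathbf{b} \\ \mathbf{c} \end{pmatrix}$$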

Note  that  the  rotation  obtained  is  not  scaled. 
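The linear step and the translation recovery can be checked numerically on synthetic data. The sketch below assumes unit focal length and a true rotation $R$, since the norm and dot-product identities rely on orthonormal rows. It generates eight correspondences, solves the homogeneous system by SVD, and recovers $\mathbf{t}$ up to scale and global sign. It stops short of retrieving $R$. The function names and the synthetic motion are hypothetical and chosen for illustration only.

```python
import numpy as np


def rot_x(theta):
    c, s = np.cos(theta), np.sin(theta)
    return np.array([[1.0, 0.0, 0.0], [0.0, c, -s], [0.0, s, c]])


def rot_y(theta):
    c, s = np.cos(theta), np.sin(theta)
    return np.array([[c, 0.0, s], [0.0, 1.0, 0.0], [-s, 0.0, c]])


def abc_from_motion(R, t):
    """a, b, c from their definitions; used here only to check the estimate."""
    rx, ry, rz = R
    tx, ty, tz = t
    return ty * rx - tx * ry, ty * rz - tz * ry, tz * rx - tx * rz


def solve_abc(p1, p2):
    """Estimate (a, b, c) up to scale from >= 8 correspondences.

    Each pair contributes one row of  a.p - x'(b.p) - y'(c.p) = 0.
    """
    rows = []
    for (x, y), (xp, yp) in zip(p1, p2):
        p = np.array([x, y, 1.0])
        rows.append(np.concatenate([p, -xp * p, -yp * p]))
    # Null vector = right singular vector of the smallest singular value.
    sol = np.linalg.svd(np.array(rows))[2][-1]
    return sol[:3], sol[3:6], sol[6:]


def translation_from_abc(a, b, c):
    """t up to scale and global sign, from the norm and dot-product identities."""
    A, B, C = a @ a, b @ b, c @ c
    sq = np.clip(0.5 * np.array([A - B + C, A + B - C, -A + B + C]), 0.0, None)
    ab, ac, bc = a @ b, a @ c, b @ c   # t_x t_z, t_y t_z, -t_x t_y
    k = int(np.argmax(sq))             # anchor on the largest component (avoids dividing by ~0)
    tk = np.sqrt(sq[k])
    if k == 0:
        return np.array([tk, -bc / tk, ab / tk])
    if k == 1:
        return np.array([-bc / tk, tk, ac / tk])
    return np.array([ab / tk, ac / tk, tk])


# Synthetic check: eight points seen before and after a known motion.
rng = np.random.default_rng(1)
R = rot_y(0.2) @ rot_x(0.1)
t = np.array([0.5, -0.2, 0.3])
X = np.column_stack([rng.uniform(-1, 1, 8), rng.uniform(-1, 1, 8), rng.uniform(4, 8, 8)])
X2 = X @ R.T + t
p1 = X[:, :2] / X[:, 2:]    # perspective projection, unit focal length
p2 = X2[:, :2] / X2[:, 2:]

a, b, c = solve_abc(p1, p2)
t_est = translation_from_abc(a, b, c)
print(t_est / np.linalg.norm(t_est), t / np.linalg.norm(t))  # equal up to sign
```

Because $\mathbf{a}$, $\mathbf{b}$, and $\mathbf{c}$ come out of the SVD with an arbitrary scale and sign, the squared identities fix $|\mathbf{t}|$ only up to that scale. The dot products then fix the relative signs of the components.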